abstract word
How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition
Yao, Yao, Yang, Yifei, Ma, Xinbei, Yang, Dongjie, Zhang, Zhuosheng, Li, Zuchao, Zhao, Hai
How human cognitive abilities are formed has long captivated researchers. However, a significant challenge lies in developing meaningful methods to measure these complex processes. With the advent of large language models (LLMs), which now rival human capabilities in various domains, we are presented with a unique testbed to investigate human cognition through a new lens. Among the many facets of cognition, one particularly crucial aspect is the concept of semantic size, the perceived magnitude of both abstract and concrete words or concepts. This study seeks to investigate whether LLMs exhibit similar tendencies in understanding semantic size, thereby providing insights into the underlying mechanisms of human cognition. We begin by exploring metaphorical reasoning, comparing how LLMs and humans associate abstract words with concrete objects of varying sizes. Next, we examine LLMs' internal representations to evaluate their alignment with human cognitive processes. Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding, suggesting that real-world, multi-modal experiences are similarly vital for human cognitive development. Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario. The results show that multi-modal LLMs are more emotionally engaged in decision-making, but this also introduces potential biases, such as the risk of manipulation through clickbait headlines. Ultimately, this study offers a novel perspective on how LLMs interpret and internalize language, from the smallest concrete objects to the most profound abstract concepts like love. The insights gained not only improve our understanding of LLMs but also provide new avenues for exploring the cognitive abilities that define human intelligence.
Do not think pink elephant!
Hwang, Kyomin, Kim, Suyoung, Lee, JunHoo, Kwak, Nojun
Large Models (LMs) have heightened expectations for the potential of general AI as they are akin to human intelligence. This paper shows that recent large models such as Stable Diffusion and DALL-E3 also share the vulnerability of human intelligence, namely the "white bear phenomenon". We investigate the causes of the white bear phenomenon by analyzing their representation space. Based on this analysis, we propose a simple prompt-based attack method, which generates figures prohibited by the LM provider's policy. To counter these attacks, we introduce prompt-based defense strategies inspired by cognitive therapy techniques, successfully mitigating attacks by up to 48.22\%.
- Europe > Ukraine (0.14)
- Asia > Russia (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- Health & Medicine > Consumer Health (0.49)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
Patterns of Closeness and Abstractness in Colexifications: The Case of Indigenous Languages in the Americas
Colexification refers to linguistic phenomena where multiple concepts (meanings) are expressed by the same lexical form, such as polysemy or homophony. Colexifications have been found to be pervasive across languages and cultures. The problem of concreteness/abstractness of concepts is interdisciplinary, studied from a cognitive standpoint in linguistics, psychology, psycholinguistics, neurophysiology, etc. In this paper, we hypothesize that concepts that are closer in concreteness/abstractness are more likey to colexify, and we test the hypothesis across indigenous languages in Americas.
- Europe > Denmark > North Jutland > Aalborg (0.05)
- Europe > Denmark > Capital Region > Copenhagen (0.05)
- North America > United States > Washington > King County > Seattle (0.05)
- (7 more...)
Does Conceptual Representation Require Embodiment? Insights From Large Language Models
Xu, Qihui, Peng, Yingying, Nastase, Samuel A., Chodorow, Martin, Wu, Minghua, Li, Ping
To what extent can language alone give rise to complex concepts, or is embodied experience essential? Recent advancements in large language models (LLMs) offer fresh perspectives on this question. Although LLMs are trained on restricted modalities, they exhibit human-like performance in diverse psychological tasks. Our study compared representations of 4,442 lexical concepts between humans and ChatGPTs (GPT-3.5 and GPT-4) across multiple dimensions, including five key domains: emotion, salience, mental visualization, sensory, and motor experience. We identify two main findings: 1) Both models strongly align with human representations in non-sensorimotor domains but lag in sensory and motor areas, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions like haptics and imageability. These results highlight the limitations of language in isolation, and that the integration of diverse modalities of inputs leads to a more human-like conceptual representation.
- Asia > China > Hong Kong (0.04)
- North America > United States > New York (0.04)
How direct is the link between words and images?
Shahmohammadi, Hassan, Heitmeier, Maria, Shafaei-Bajestan, Elnaz, Lensch, Hendrik P. A., Baayen, Harald
Current word embedding models despite their success, still suffer from their lack of grounding in the real world. In this line of research, Gunther et al. 2022 proposed a behavioral experiment to investigate the relationship between words and images. In their setup, participants were presented with a target noun and a pair of images, one chosen by their model and another chosen randomly. Participants were asked to select the image that best matched the target noun. In most cases, participants preferred the image selected by the model. Gunther et al., therefore, concluded the possibility of a direct link between words and embodied experience. We took their experiment as a point of departure and addressed the following questions. 1. Apart from utilizing visually embodied simulation of given images, what other strategies might subjects have used to solve this task? To what extent does this setup rely on visual information from images? Can it be solved using purely textual representations? 2. Do current visually grounded embeddings explain subjects' selection behavior better than textual embeddings? 3. Does visual grounding improve the semantic representations of both concrete and abstract words? To address these questions, we designed novel experiments by using pre-trained textual and visually grounded word embeddings. Our experiments reveal that subjects' selection behavior is explained to a large extent based on purely text-based embeddings and word-based similarities, suggesting a minor involvement of active embodied experiences. Visually grounded embeddings offered modest advantages over textual embeddings only in certain cases. These findings indicate that the experiment by Gunther et al. may not be well suited for tapping into the perceptual experience of participants, and therefore the extent to which it measures visually grounded knowledge is unclear.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Africa > Kenya > Mandera County > Mandera (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Language with Vision: a Study on Grounded Word and Sentence Embeddings
Shahmohammadi, Hassan, Heitmeier, Maria, Shafaei-Bajestan, Elnaz, Lensch, Hendrik P. A., Baayen, Harald
Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal equilibrium between textual representations of the language and our embodied experiences remains an open field. Some common concerns are the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging the current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective computational grounding model for pre-trained word embeddings. Our model effectively balances the interplay between language and vision by aligning textual embeddings with visual information while simultaneously preserving the distributional statistics that characterize word usage in text corpora. By applying a learned alignment, we are able to indirectly ground unseen words including abstract words. A series of evaluations on a range of behavioural datasets shows that visual grounding is beneficial not only for concrete words but also for abstract words, lending support to the indirect theory of abstract concepts. Moreover, our approach offers advantages for contextualized embeddings, such as those generated by BERT, but only when trained on corpora of modest, cognitively plausible sizes. Code and grounded embeddings for English are available at https://github.com/Hazel1994/Visually_Grounded_Word_Embeddings_2.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Africa > Kenya > Mandera County > Mandera (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (23 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Education (1.00)
- Government (0.92)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
ReCAM@IITK at SemEval-2021 Task 4: BERT and ALBERT based Ensemble for Abstract Word Prediction
Mittal, Abhishek, Modi, Ashutosh
This paper describes our system for Task 4 of SemEval-2021: Reading Comprehension of Abstract Meaning (ReCAM). We participated in all subtasks where the main goal was to predict an abstract word missing from a statement. We fine-tuned the pre-trained masked language models namely BERT and ALBERT and used an Ensemble of these as our submitted system on Subtask 1 (ReCAM-Imperceptibility) and Subtask 2 (ReCAM-Nonspecificity). For Subtask 3 (ReCAM-Intersection), we submitted the ALBERT model as it gives the best results. We tried multiple approaches and found that Masked Language Modeling(MLM) based approach works the best.
- Oceania > Australia (0.04)
- Europe > United Kingdom > Scotland (0.04)
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning
Xie, Xin, Chen, Xiangnan, Chen, Xiang, Wang, Yong, Zhang, Ningyu, Deng, Shumin, Chen, Huajun
This paper presents our systems for the three Subtasks of SemEval Task4: Reading Comprehension of Abstract Meaning (ReCAM). We explain the algorithms used to learn our models and the process of tuning the algorithms and selecting the best model. Inspired by the similarity of the ReCAM task and the language pre-training, we propose a simple yet effective technology, namely, negative augmentation with language model. Evaluation results demonstrate the effectiveness of our proposed approach. Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2 with an accuracy of 87.9% and an accuracy of 92.8%, respectively. We further conduct comprehensive model analysis and observe interesting error cases, which may promote future researches.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > Wales (0.04)
- (11 more...)
- Leisure & Entertainment > Sports (0.93)
- Education > Assessment & Standards > Student Performance (0.61)
Learning Multimodal Word Representation via Dynamic Fusion Methods
Wang, Shaonan (Institute of Automation, Chinese Academy of Sciences) | Zhang, Jiajun (Institute of Automation, Chinese Academy of Sciences) | Zong, Chengqing (Institute of Automation, Chinese Academy of Sciences)
Multimodal models have been proven to outperform text-based models on learning semantic word representations. Almost all previous multimodal models typically treat the representations from different modalities equally. However, it is obvious that information from different modalities contributes differently to the meaning of words. This motivates us to build a multimodal model that can dynamically fuse the semantic representations from different modalities according to different types of words. To that end, we propose three novel dynamic fusion methods to assign importance weights to each modality, in which weights are learned under the weak supervision of word association pairs. The extensive experiments have demonstrated that the proposed methods outperform strong unimodal baselines and state-of-the-art multimodal models.
A Kids' Open Mind Common Sense
Bosch, Antal van den (Tilburg University) | Nauts, Pim (Tilburg University) | Eckhardt, Nienke (Tilburg University)
We propose a collaborative approach to the issue of resource creation for commonsense computing by developing a collaboratory application aimed at children. Human validation is enabled through a game-with-a-purpose (GWAP) interface, gathering reliability judgements of assertions that can be used to aid the process of resource validation. Our experiments confirm that children aged 10 to 12 can be valuable and reliable partners in building commonsense databases, due to their stage of mental development and their eagerness to play GWAPs. Results show that children adapt their word choice in the assertions they provide to the difficulty level of the stimuli words, and that the judgements gathered through in-game validation can help to validate about 30% of the gathered statements automatically.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)